Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
Authors
Abstract
Multi-armed bandit tasks have been used extensively to model the problem of balancing exploitation and exploration. One of the most challenging variants of the multi-armed bandit problem (MABP) is the non-stationary bandit problem, where the agent faces the additional complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number of arms and Gaussian rewards. A family of important ad hoc methods exists that is suitable for non-stationary bandit tasks. These learning algorithms, which offer intuition-based solutions to the exploitation–exploration trade-off, have the advantage of not relying on strong theoretical assumptions, while at the same time they can be fine-tuned to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem presents itself in the form of evolutionary algorithms. We present an evolutionary algorithm implemented to solve the non-stationary bandit problem, along with ad hoc solution algorithms, namely action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and finally the adaptive pursuit method. A number of simulation-based experiments were conducted, and we discuss the methods' performances based on the numerical results obtained. © 2007 Elsevier Inc. All rights reserved.
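The action-value methods mentioned in the abstract can be illustrated with a minimal sketch. The snippet below is not from the paper; it shows the standard constant-step-size ε-greedy update for a Gaussian bandit, which weights recent rewards more heavily and is the usual adaptation for non-stationary problems. The function name and all parameter values are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(means, epsilon=0.1, alpha=0.1, steps=2000, seed=0):
    """Constant-step-size epsilon-greedy on a Gaussian bandit (sketch).

    `means` holds the true arm means; rewards are Gaussian with unit
    variance. A constant step size `alpha` makes the estimates track a
    drifting environment instead of averaging over all history.
    """
    rng = random.Random(seed)
    q = [0.0] * len(means)                # action-value estimates
    for _ in range(steps):
        if rng.random() < epsilon:        # explore: pick a random arm
            a = rng.randrange(len(means))
        else:                             # exploit: pick the greedy arm
            a = max(range(len(means)), key=lambda i: q[i])
        r = rng.gauss(means[a], 1.0)      # sample a Gaussian reward
        q[a] += alpha * (r - q[a])        # incremental update
    return q
```

A softmax rule would replace the if/else selection with sampling proportional to exp(q[i]/τ) for a temperature τ; the update step stays the same.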
Similar Resources
Upper Confidence Trees and Billiards for Optimal Active Learning
This paper focuses on Active Learning (AL) with bounded computational resources. AL is formalized as a finite-horizon Reinforcement Learning problem and tackled as a single-player game. An approximate optimal AL strategy based on tree-structured multi-armed bandit algorithms and billiard-based sampling is presented, together with a proof of principle of the approach...
Markov Security Games: Learning in Spatial Security Problems
In this paper we present a preliminary investigation of modelling spatial aspects of security games within the context of Markov games. Reinforcement learning is a powerful tool for adaptation in unknown environments; however, the basic single-agent RL algorithms are unfit to be applied in adversarial scenarios. Therefore, we profit from Adversarial Multi-Armed Bandit (AMAB) methods which are des...
Learning Optimal Parameter Values in Dynamic Environment: An Experiment with Softmax Reinforcement Learning Algorithm
1. Introduction Many learning and heuristic search algorithms require tuning of parameters to achieve optimum performance. In stationary and deterministic problem domains this is usually achieved through off-line sensitivity analysis. However, this method breaks down in non-stationary and non-deterministic environments, where the optimal set of values for the parameters keeps changing over time....
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1−δ. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise act...
Robot Beerpong: Model-Based Learning for Shifting Targets
Defining controls for a robot to achieve precise goal-directed movements can be hard when using hand-crafted solutions. Reinforcement Learning, particularly policy-search methods, provides a promising alternative which has already been successfully used for robot learning. Here the task is learned using a function that rewards desired movements and an algorithm that seeks to maximize the reward. In...
Journal:
- Applied Mathematics and Computation
Volume 196, Issue
Pages -
Publication year: 2008